m2a <-lm(data = FirstYearGPA, GPA ~ Male + FirstGen + White + HSGPA_C)summary(m2a)
Call:
lm(formula = GPA ~ Male + FirstGen + White + HSGPA_C, data = FirstYearGPA)
Residuals:
Min 1Q Median 3Q Max
-1.05776 -0.25735 0.04178 0.28247 0.74665
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.88512 0.06660 43.321 < 2e-16 ***
Male 0.06194 0.05416 1.144 0.254080
FirstGen -0.18508 0.08709 -2.125 0.034719 *
White 0.25738 0.06798 3.786 0.000199 ***
HSGPA_C 0.55967 0.07220 7.751 3.63e-13 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3963 on 214 degrees of freedom
Multiple R-squared: 0.2884, Adjusted R-squared: 0.2751
F-statistic: 21.68 on 4 and 214 DF, p-value: 4.96e-15
vif(m2a)
Male FirstGen White HSGPA_C
1.017807 1.069368 1.068992 1.016413
vcov(m2a)
(Intercept) Male FirstGen White
(Intercept) 0.0044353517 -0.0012011789 -0.0020911751 -0.0036963547
Male -0.0012011789 0.0029337643 0.0002489512 -0.0002451420
FirstGen -0.0020911751 0.0002489512 0.0075845739 0.0014043929
White -0.0036963547 -0.0002451420 0.0014043929 0.0046207882
HSGPA_C 0.0001548344 0.0003521719 -0.0004544480 -0.0003379714
HSGPA_C
(Intercept) 0.0001548344
Male 0.0003521719
FirstGen -0.0004544480
White -0.0003379714
HSGPA_C 0.0052135130
What are the log-likelihoods of each model? What can you say about the models based on those values? What can’t you say?
logLik(m1)
'log Lik.' -132.6243 (df=5)
logLik(m2)
'log Lik.' -105.5292 (df=6)
logLik(m2a)
'log Lik.' -105.5292 (df=6)
You cannot say anything about the individual likelihoods, only compare them. Likelihood (/log likelihood) are relative measures, unlike probability.
Model 2/2a has a higher (closer to 0) log likelihood than model 1.
Centering HSGPA doesn’t change the log likelihood.
Compare the two models using a likelihood ratio test (LRT). Report the results of the test. What can you say about the models based on the test?
anova(m1, m2, test ="LRT")
Analysis of Variance Table
Model 1: GPA ~ Male + FirstGen + White
Model 2: GPA ~ Male + FirstGen + White + HSGPA
Res.Df RSS Df Sum of Sq Pr(>Chi)
1 215 43.051
2 214 33.614 1 9.4369 9.108e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-2*(logLik(m1) -logLik(m2))
'log Lik.' 54.1902 (df=5)
Model 2 is significantly better than model 1: \chi^2(1) = 54.1902, p<.0001.
Adding HSGPA to a model with Male, FirstGen, and White significantly improves the model.
Report the results for the better model (based on the LRT). Include all regression coefficients, R^2, test statistics, p-values.
Intercept: b_0 = 2.885, t(214) = 43.321, p<.0001
The expected first year college GPA for a person who is female, not a first generation college student, not white, and who has mean high school GPA is 2.885.
Male: b_1 = 0.062, t(214) = 1.144, p>.05
Male students have a first year college GPA that is 0.062 points higher than female students, holding all other predictors constant. This effect is not significantly different from 0.
FirstGen: b_2 = -0.185, t(214) = -2.125, p<.05
First generation college students have a first year college GPA that is 0.185 points lower than that of non-first generation students, holding all other predictors constant; this effect is significant.
White: b_3 = 0.25738, t(214) = 3.786, p<.001
White students have a first year college GPA that is 0.257 points higher than their non-white classmates, holding all other predictors constant; this effect is significant.
HSGPA_C: b_4 = 0.55967, t(214) = 7.751, p<.0001
Each 1 unit increase in high school GPA corresponds to a 0.560 point higher first year college GPA, holding all other predictors constant. This effect is significant.
The four predictors in this model account for 28.84% of the variance in the outcome, first year college GPA.
Source Code
---title: "BTS 510 Lab 9"format: html: embed-resources: true self-contained-math: true html-math-method: katex number-sections: true toc: true code-tools: true code-block-bg: true code-block-border-left: "#31BAE9"---```{r}#| label: setupset.seed(12345)library(tidyverse)library(Stat2Data)theme_set(theme_classic(base_size =16))```## Learning objectives* **Describe** *maximum likelihood estimation* for linear regression * **Describe** *hypothesis testing* for linear regression coefficients, including the sampling distribution used and degrees of freedom## Data * `FirstYearGPA` data from the **Stat2Data** package: $n$ = 219 subjects * `GPA`: First-year college GPA on a 0.0 to 4.0 scale * `HSGPA`: High school GPA on a 0.0 to 4.0 scale * `SATV`: Verbal/critical reading SAT score * `SATM`: Math SAT score * `Male`: 1= male, 0= female * `HU`: Number of credit hours earned in humanities courses in high school * `SS`: Number of credit hours earned in social science courses in high school * `FirstGen`: 1= student is the first in her or his family to attend college, 0=otherwise * `White`: 1= white students, 0= others * `CollegeBound`: 1=attended a high school where >=50% students intended to go on to college, 0=otherwise## TasksQuestion 1: How do demographic variables (`Male`, `FirstGen` and `White`) predict first year college GPA (`GPA`)?Question 2: How does HS GPA (`HSGPA`) predict first year college GPA (`GPA`) over demographic variables (`Male`, `FirstGen` and `White`)?1. Run the two models above.```{r}library(car)library(Stat2Data)data(FirstYearGPA)head(FirstYearGPA)m1 <-lm(data = FirstYearGPA, GPA ~ Male + FirstGen + White)summary(m1)vif(m1)vcov(m1)m2 <-lm(data = FirstYearGPA, GPA ~ Male + FirstGen + White + HSGPA)summary(m2)vif(m2)vcov(m2)```* Mean center `HSGPA` to make intercept more interpretable```{r}FirstYearGPA <- FirstYearGPA %>%mutate(HSGPA_C = HSGPA -mean(HSGPA, na.rm =TRUE))mean(FirstYearGPA$HSGPA, na.rm =TRUE)m2a <-lm(data = FirstYearGPA, GPA ~ Male + FirstGen + White + HSGPA_C)summary(m2a)vif(m2a)vcov(m2a)```2. What are the log-likelihoods of each model? What can you say about the models based on those values? What *can't* you say?```{r}logLik(m1)logLik(m2)logLik(m2a)```* You cannot say anything about the individual likelihoods, only compare them. Likelihood (/log likelihood) are **relative** measures, unlike probability.* Model 2/2a has a higher (closer to 0) log likelihood than model 1.* Centering `HSGPA` doesn't change the log likelihood.3. Compare the two models using a likelihood ratio test (LRT). Report the results of the test. What can you say about the models based on the test?```{r}anova(m1, m2, test ="LRT")-2*(logLik(m1) -logLik(m2))```* Model 2 is significantly better than model 1: $\chi^2(1) = 54.1902, p<.0001$.* Adding `HSGPA` to a model with `Male`, `FirstGen`, and `White` significantly improves the model.4. Report the results for the better model (based on the LRT). Include all regression coefficients, $R^2$, test statistics, $p$-values.* Intercept: $b_0 = 2.885, t(214) = 43.321, p<.0001$ * The expected first year college GPA for a person who is **female**, **not** a first generation college student, **not** white, and who has mean high school GPA is 2.885.* `Male`: $b_1 = 0.062, t(214) = 1.144, p>.05$ * Male students have a first year college GPA that is 0.062 points *higher* than female students, *holding all other predictors constant*. This effect is not significantly different from 0.* `FirstGen`: $b_2 = -0.185, t(214) = -2.125, p<.05$ * First generation college students have a first year college GPA that is 0.185 points *lower* than that of non-first generation students, *holding all other predictors constant*; this effect is significant.* `White`: $b_3 = 0.25738, t(214) = 3.786, p<.001$ * White students have a first year college GPA that is 0.257 points *higher* than their non-white classmates, *holding all other predictors constant*; this effect is significant.* `HSGPA_C`: $b_4 = 0.55967, t(214) = 7.751, p<.0001$ * Each 1 unit increase in high school GPA corresponds to a 0.560 point *higher* first year college GPA, *holding all other predictors constant*. This effect is significant.* $R^2_{multiple} = 0.2884, F(4, 214) = 21.68, p<.0001$ * The four predictors in this model account for 28.84% of the variance in the outcome, first year college GPA.